7 research outputs found

    Advancing Data-Efficiency in Reinforcement Learning

    Get PDF
    In many real-world applications, including traffic control, robotics and web system configuration, we are confronted with real-time decision-making problems where data is limited. Reinforcement Learning (RL) provides a mathematical framework for solving sequential decision-making problems under uncertainty. Under low-data constraints, RL agents must quickly identify relevant information in their observations, and quickly learn how to act in order to attain their long-term objective. While recent advances in RL have demonstrated impressive achievements, the end-to-end approach they take favours autonomy and flexibility at the expense of fast learning. To be of practical use, there is an undeniable need to improve the data-efficiency of existing systems. Ideal RL agents would possess an optimal way of representing their environment, combined with an efficient mechanism for propagating reward signals across the state space. This thesis investigates the problem of data-efficiency in RL from these two perspectives. A deep overview of the different representation learning methods in use in RL is provided. The aim of this overview is to categorise the different representation learning approaches and highlight the impact of the representation on data-efficiency. This framing is then used to develop two main research directions. The first focuses on learning a representation that captures the geometry of the problem: an RL mechanism that uses a scalable feature-learning-on-graphs method to learn such rich representations is introduced, ultimately leading to more efficient value function approximation. Secondly, ET(λ), an algorithm that improves credit assignment in stochastic environments by propagating reward information counterfactually, is presented. ET(λ) results in faster learning compared to traditional methods that rely solely on temporal credit assignment. Overall, this thesis shows that a structural representation encoding the geometry of the state space and counterfactual credit assignment are key characteristics of data-efficient RL.
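    The temporal credit assignment that the thesis improves on is the classic eligibility-trace mechanism of TD(λ). As a point of reference, it can be sketched in a few lines; the `env_step` interface, the tabular setting and the hyperparameter values are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def td_lambda(env_step, n_states, n_episodes, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    env_step(state) -> (next_state, reward, done) is a hypothetical
    environment interface assumed for this sketch.
    """
    v = np.zeros(n_states)              # value estimates
    for _ in range(n_episodes):
        e = np.zeros(n_states)          # eligibility traces, reset per episode
        s, done = 0, False
        while not done:
            s_next, r, done = env_step(s)
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]  # TD error
            e[s] += 1.0                 # mark the visited state as eligible
            v += alpha * delta * e      # spread credit along the recent trajectory
            e *= gamma * lam            # decay all traces
            s = s_next
    return v
```

    The trace `e` only covers states actually visited this episode, which is exactly the limitation that counterfactual credit assignment addresses.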

    Representation Learning on Graphs: A Reinforcement Learning Application

    Full text link
    In this work, we study value function approximation in reinforcement learning (RL) problems with high-dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) at accurately approximating the value function in low dimensions, and we highlight the importance of feature learning for improved low-dimensional value function approximation. We then adopt different representation learning algorithms on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and the Variational Graph Auto-Encoder consistently outperform the commonly used smooth proto-value functions in low-dimensional feature spaces.
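    The PVF baseline mentioned above takes the smoothest eigenvectors of the state-space graph Laplacian as basis functions and fits the value function by projection onto them. A minimal numpy sketch of that baseline, assuming a normalised Laplacian and a least-squares fit (the function names and the toy setup are illustrative, not from the paper):

```python
import numpy as np

def pvf_basis(adj, k):
    """Proto-value functions: the k smoothest eigenvectors of the
    normalised Laplacian of the state-space graph (adjacency `adj`)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(lap)    # eigenvalues in ascending order
    return eigvecs[:, :k]               # columns ordered by smoothness

def fit_values(phi, v_target):
    """Least-squares projection of a target value function onto the basis."""
    w, *_ = np.linalg.lstsq(phi, v_target, rcond=None)
    return phi @ w                      # low-dimensional approximation
```

    Methods such as node2vec or a Variational Graph Auto-Encoder replace `pvf_basis` with learned node embeddings; the fitting step stays the same.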

    Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity

    Get PDF
    BACKGROUND: The ever-growing size, breadth, and availability of patient data allow for a wide variety of clinical features to serve as inputs for phenotype discovery using cluster analysis. Data of mixed types in particular are not straightforward to combine into a single feature vector, and techniques used to address this can be biased towards certain data types in ways that are not immediately obvious or intended. In this context, the process of constructing clinically meaningful patient representations from complex datasets has not been systematically evaluated. AIMS: Our aim was to a) outline and b) implement an analytical framework to evaluate distinct methods of constructing patient representations from routine electronic health record data for the purpose of measuring patient similarity. We applied the analysis to a patient cohort diagnosed with chronic obstructive pulmonary disease. METHODS: Using data from the CALIBER data resource, we extracted clinically relevant features for a cohort of patients diagnosed with chronic obstructive pulmonary disease. We used four different data processing pipelines to construct lower-dimensional patient representations, from which we calculated patient similarity scores. We described the resulting representations, ranked the influence of each individual feature on patient similarity, and evaluated the effect of the different pipelines on clustering outcomes. Experts evaluated the resulting representations by rating the clinical relevance of similar-patient suggestions with regard to a reference patient. RESULTS: Each of the four pipelines resulted in similarity scores primarily driven by a unique set of features. Data transformations applied by each pipeline prior to clustering were shown to change clustering results by over 40%. The most appropriate pipeline was selected on the basis of feature ranking and clinical expertise. There was moderate agreement between clinicians as measured by Cohen's kappa coefficient. CONCLUSIONS: Data transformation has downstream and unforeseen consequences in cluster analysis. Rather than viewing this process as a black box, we have shown ways to quantitatively and qualitatively evaluate and select the appropriate preprocessing pipeline.
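    Once a pipeline has produced a lower-dimensional representation, similarity scores and similar-patient rankings of the kind the experts rated can be computed, for example, from pairwise cosine similarities. A minimal sketch under that assumption (the cosine metric and the function names are illustrative; the paper's exact similarity measure may differ):

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between patient representations (rows of x)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_unit = x / np.maximum(norms, 1e-12)   # guard against zero-norm rows
    return x_unit @ x_unit.T

def most_similar(sim, ref, k=3):
    """Indices of the k patients most similar to patient `ref`, excluding ref."""
    order = np.argsort(-sim[ref])
    return [i for i in order if i != ref][:k]
```

    Running the same ranking on the output of each pipeline makes it visible how strongly the choice of preprocessing drives which patients are deemed similar.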

    Expected Eligibility Traces

    Get PDF
    The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces allow, with a single update, credit to be assigned to states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between instantaneous and expected traces by a mechanism similar to bootstrapping, which ensures that the resulting algorithm is a strict generalisation of TD(λ). Finally, we discuss possible extensions and connections to related ideas, such as successor features. Comment: AAAI, distinguished paper award.
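    The core idea above can be sketched in a simplified tabular form: alongside the instantaneous trace, learn an expected trace z[s] as a running average of the trace observed whenever s is visited, and use z[s] in the value update so that credit also flows to states that tend to precede s even when they did not on this episode. This is an illustration of the principle, not the paper's exact algorithm; the `env_step` interface and hyperparameters are assumptions of the sketch.

```python
import numpy as np

def expected_traces_td(env_step, n_states, n_episodes,
                       alpha=0.1, beta=0.1, gamma=0.99, lam=0.9):
    """Simplified tabular TD learning with expected eligibility traces."""
    v = np.zeros(n_states)
    z = np.zeros((n_states, n_states))   # z[s] ~ E[trace | current state = s]
    for _ in range(n_episodes):
        e = np.zeros(n_states)           # instantaneous trace, reset per episode
        s, done = 0, False
        while not done:
            s_next, r, done = env_step(s)
            e = gamma * lam * e          # decay, then mark current state
            e[s] += 1.0
            z[s] += beta * (e - z[s])    # learn the expected trace of state s
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
            v += alpha * delta * z[s]    # counterfactual credit via z, not e
            s = s_next
    return v
```

    In a stochastic environment, z[s] averages over the many trajectories that reach s, which is where the expected trace gains over the single-trajectory trace.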
